NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

History-Guided Video Diffusion

Song, Kiwhan; Chen, Boyuan; Simchowitz, Max; Du, Yilun; Tedrake, Russ; Sitzmann, Vincent (July 2025, 2025 Forty-Second International Conference on Machine Learning)

Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos.
more » « less
Free, publicly-accessible full text available July 17, 2026
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Chen, Boyuan; Martí_Monsó, Diego; Du, Yilun; Simchowitz, Max; Tedrake, Russ; Sitzmann, Vincent (September 2024, Neural Information Processing Systems 2024)

This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of next-token prediction models, such as variable-length generation, with the strengths of full-sequence diffusion models, such as the ability to guide sampling to desirable trajectories. Our method offers a range of additional capabilities, such as (1) rolling-out sequences of continuous tokens, such as video, with lengths past the training horizon, where baselines diverge and (2) new sampling and guiding schemes that uniquely profit from Diffusion Forcing's variable-horizon and causal architecture, and which lead to marked performance gains in decision-making and planning tasks. In addition to its empirical success, our method is proven to optimize a variational lower bound on the likelihoods of all subsequences of tokens drawn from the true joint distribution.
more » « less
Full Text Available
Functional nanoporous graphene superlattice

https://doi.org/10.1038/s41467-024-45503-9

Lv, Hualiang; Yao, Yuxing; Yuan, Mingyue; Chen, Guanyu; Wang, Yuchao; Rao, Longjun; Li, Shucong; Kara, Ufuoma I; Dupont, Robert L; Zhang, Cheng; et al (December 2024, Nature Communications)

Abstract Two-dimensional (2D) superlattices, formed by stacking sublattices of 2D materials, have emerged as a powerful platform for tailoring and enhancing material properties beyond their intrinsic characteristics. However, conventional synthesis methods are limited to pristine 2D material sublattices, posing a significant practical challenge when it comes to stacking chemically modified sublattices. Here we report a chemical synthesis method that overcomes this challenge by creating a unique 2D graphene superlattice, stacking graphene sublattices with monodisperse, nanometer-sized, square-shaped pores and strategically doped elements at the pore edges. The resulting graphene superlattice exhibits remarkable correlations between quantum phases at both the electron and phonon levels, leading to diverse functionalities, such as electromagnetic shielding, energy harvesting, optoelectronics, and thermoelectrics. Overall, our findings not only provide chemical design principles for synthesizing and understanding functional 2D superlattices but also expand their enhanced functionality and extensive application potential compared to their pristine counterparts.
more » « less
Full Text Available
Fully body visual self-modeling of robot morphologies

https://doi.org/10.1126/scirobotics.abn1944

Chen, Boyuan; Kwiatkowski, Robert; Vondrick, Carl; Lipson, Hod (July 2022, Science Robotics)

A robot can learn full-body morphology via visual self-modeling to adapt to multiple motion planning and control tasks.
more » « less
Full Text Available
Force-Aware Interface via Electromyography for Natural VR/AR Interaction

https://doi.org/10.1145/3550454.3555461

Zhang, Yunxiang; Liang, Benjamin; Chen, Boyuan; Torrens, Paul M.; Atashzar, S. Farokh; Lin, Dahua; Sun, Qi (December 2022, ACM Transactions on Graphics)

While tremendous advances in visual and auditory realism have been made for virtual and augmented reality (VR/AR), introducing a plausible sense of physicality into the virtual world remains challenging. Closing the gap between real-world physicality and immersive virtual experience requires a closed interaction loop: applying user-exerted physical forces to the virtual environment and generating haptic sensations back to the users. However, existing VR/AR solutions either completely ignore the force inputs from the users or rely on obtrusive sensing devices that compromise user experience. By identifying users' muscle activation patterns while engaging in VR/AR, we design a learning-based neural interface for natural and intuitive force inputs. Specifically, we show that lightweight electromyography sensors, resting non-invasively on users' forearm skin, inform and establish a robust understanding of their complex hand activities. Fuelled by a neural-network-based model, our interface can decode finger-wise forces in real-time with 3.3% mean error, and generalize to new users with little calibration. Through an interactive psychophysical study, we show that human perception of virtual objects' physical properties, such as stiffness, can be significantly enhanced by our interface. We further demonstrate that our interface enables ubiquitous control via finger tapping. Ultimately, we envision our findings to push forward research towards more realistic physicality in future VR/AR.
more » « less
Full Text Available
Unsupervised Learning of Visual 3D Keypoints for Control

Chen, Boyuan; Abbeel, Pieter; Pathak, Deepak (July 2021, International Conference on Machine Learning)
null (Ed.)
Learning sensorimotor control policies from highdimensional images crucially relies on the quality of the underlying visual representations. Prior works show that structured latent space such as visual keypoints often outperforms unstructured representations for robotic control. However, most of these representations, whether structured or unstructured are learned in a 2D space even though the control tasks are usually performed in a 3D environment. In this work, we propose a framework to learn such a 3D geometric structure directly from images in an end-toend unsupervised manner. The input images are embedded into latent 3D keypoints via a differentiable encoder which is trained to optimize both a multi-view consistency loss and downstream task objective. These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space. The proposed approach outperforms prior state-of-art methods across a variety of reinforcement learning benchmarks. Code and videos at https://buoyancy99.github. io/unsup-3d-keypoints/.
more » « less
Full Text Available
Unsupervised Learning of Visual 3D Keypoints for Control

Chen, Boyuan; Abbeel, Pieter; Pathak, Deepak (January 2021, International Conference on Machine Learning)

Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations. Prior works show that structured latent space such as visual keypoints often outperforms unstructured representations for robotic control. However, most of these representations, whether structured or unstructured are learned in a 2D space even though the control tasks are usually performed in a 3D environment. In this work, we propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner. The input images are embedded into latent 3D keypoints via a differentiable encoder which is trained to optimize both a multi-view consistency loss and downstream task objective. These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space. The proposed approach outperforms prior state-of-art methods across a variety of reinforcement learning benchmarks. Code and videos at https://buoyancy99.github.io/unsup-3d-keypoints/
more » « less
Full Text Available
Visual Hide and Seek

https://doi.org/10.1162/isal_a_00269

Chen, Boyuan; Song, Shuran; Lipson, Hod; Vondrick, Carl (January 2020, The 2020 Conference on Artificial Life)

We train embodied agents to play Visual Hide and Seek to study the relationship between agent behaviors and environmental complexity. In Visual Hide and Seek, a prey must navigate in a simulated environment in order to avoid capture from a predator, only relying on first-person visual observations. By probing different environmental factors, agents exhibit diverse hiding strategies and even the knowledge of its own visibility to other agents in the scene. Furthermore, we quantitatively analyze how agent weaknesses, such as slower speed, affect the learned policy. Our results suggest that, although agent weakness makes the learning problem more challenging, they also cause more useful features to be learned.
more » « less
Full Text Available
Workspace Aware Online Grasp Planning

https://doi.org/10.1109/IROS.2018.8593644

Akinola, Iretiayo; Varley, Jacob; Chen, Boyuan; Allen, Peter K. (October 2018, Proceedings Article published Oct 2018 in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))

This work provides a framework for a workspace aware online grasp planner. This framework greatly improves the performance of standard online grasp planning algorithms by incorporating a notion of reachability into the online grasp planning process. Offline, a database of hundreds of thousands of unique end-effector poses were queried for feasibility. At runtime, our grasp planner uses this database to bias the hand towards reachable end-effector configurations. The bias keeps the grasp planner in accessible regions of the planning scene so that the resulting grasps are tailored to the situation at hand. This results in a higher percentage of reachable grasps, a higher percentage of successful grasp executions, and a reduced planning time. We also present experimental results using simulated and real environments.
more » « less
Full Text Available

Search for: All records